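We introduce the problem of predicting, from a single video frame, a low-dimensional subspace of optical flow that includes the actual instantaneous optical flow. We show how several natural scene assumptions allow us to identify an appropriate flow subspace via a set of basis flow fields parameterized by disparity and a representation of object instances. The flow subspace, together with a novel loss function, can be used for the tasks of predicting monocular depth or predicting depth plus an object instance embedding. This provides a new approach to learning these tasks in an unsupervised manner from monocular input video, without requiring camera intrinsics or poses.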
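To make the flow-subspace idea concrete, here is a minimal sketch. It assumes the classic instantaneous motion-field model for a pinhole camera with unit focal length: three translational basis fields scaled by per-pixel disparity (inverse depth), and three rotational fields that do not depend on depth. Flipping a field's sign convention does not change the span. The loss shown, the squared distance from an observed flow to the span of the basis fields, is an illustrative assumption. Object-instance basis fields and the paper's actual loss are not reproduced, and all function names are hypothetical.

```python
import math
import random


def basis_fields(points, disparity):
    """Six flattened basis flow fields: 3 translational, 3 rotational.

    Assumes normalized image coordinates (focal length 1) and disparity = 1/depth.
    Each field is a flat list [u0, v0, u1, v1, ...].
    """
    bases = [[] for _ in range(6)]
    for (x, y), d in zip(points, disparity):
        per_field = [
            (-d, 0.0),                 # camera translation along x
            (0.0, -d),                 # camera translation along y
            (x * d, y * d),            # camera translation along z
            (x * y, 1.0 + y * y),      # rotation about x (depth-independent)
            (-(1.0 + x * x), -x * y),  # rotation about y
            (y, -x),                   # rotation about z
        ]
        for k, (u, v) in enumerate(per_field):
            bases[k].extend((u, v))
    return bases


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def orthonormalize(vectors, eps=1e-10):
    """Modified Gram-Schmidt; drops vectors that are (numerically) dependent."""
    ortho = []
    for v in vectors:
        w = list(v)
        for q in ortho:
            c = dot(w, q)
            w = [wi - c * qi for wi, qi in zip(w, q)]
        norm = math.sqrt(dot(w, w))
        if norm > eps:
            ortho.append([wi / norm for wi in w])
    return ortho


def subspace_residual(flow, bases):
    """Squared distance from `flow` to span(bases), usable as a loss."""
    r = list(flow)
    for q in orthonormalize(bases):
        c = dot(r, q)
        r = [ri - c * qi for ri, qi in zip(r, q)]
    return dot(r, r)


random.seed(0)
pts = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(50)]
disp = [random.uniform(0.1, 1.0) for _ in pts]
B = basis_fields(pts, disp)
coeffs = [0.3, -0.1, 0.5, 0.02, -0.04, 0.01]
in_span = [sum(c * b[i] for c, b in zip(coeffs, B)) for i in range(len(B[0]))]
noise = [random.gauss(0, 1) for _ in in_span]
print(subspace_residual(in_span, B))  # close to 0: this flow lies in the subspace
print(subspace_residual(noise, B))    # large: random flow mostly lies outside it
```

Because the rotational fields do not involve disparity, only the translational part of the subspace depends on what the network predicts. This is why a disparity map alone can be supervised by how well observed flow fits the subspace.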
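Aggressive data augmentation is a key component of the strong generalization capabilities of Vision Transformers (ViTs). One such data augmentation technique is adversarial training; however, many prior works have shown that it often results in poor clean accuracy. In this work, we present Pyramid Adversarial Training, a simple and effective technique to improve ViT's overall performance. We pair it with a "matched" Dropout and stochastic depth regularization, which adopts the same Dropout and stochastic depth configuration for the clean and adversarial samples. Similar to the improvements on CNNs by AdvProp (which is not directly applicable to ViT), our Pyramid Adversarial Training breaks the trade-off between in-distribution accuracy and out-of-distribution robustness for ViT and related architectures. When trained on ImageNet-1K data, it yields a 1.82% improvement in ImageNet clean accuracy for the ViT-B model, while simultaneously improving performance on 7 ImageNet robustness metrics by 1.76% to 11.45%. We set a new state of the art for ImageNet-C (41.4 mCE), ImageNet-R (53.92%), and ImageNet-Sketch (41.04%) using only the ViT-B/16 backbone and our Pyramid Adversarial Training. Our code will be made publicly available upon acceptance.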
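The core of the method is the pyramid-structured perturbation. The sketch below assumes the formulation delta = sum_s m_s * Upsample_s(delta_s). Each level is a coarse grid of learnable values, upsampled by nearest neighbour, updated by a signed-gradient step, and clipped per level. The scales, multipliers, step size and epsilon are illustrative placeholders, not the paper's settings. The pixel gradient is supplied by the caller instead of coming from a real ViT, images are single-channel 2-D lists, and matched Dropout/stochastic depth is not shown.

```python
def upsample(grid, factor):
    """Nearest-neighbour upsampling of a 2-D list by an integer factor."""
    out = []
    for row in grid:
        wide = [v for v in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def pyramid_perturbation(levels, scales, multipliers):
    """Sum of m_s * upsample(delta_s) over all pyramid levels."""
    total = None
    for delta, s, m in zip(levels, scales, multipliers):
        up = upsample(delta, s)
        if total is None:
            total = [[m * v for v in row] for row in up]
        else:
            total = [[t + m * v for t, v in zip(trow, urow)]
                     for trow, urow in zip(total, up)]
    return total


def sign(x):
    return (x > 0) - (x < 0)


def pyramid_step(levels, scales, multipliers, pixel_grad, lr, eps):
    """One signed-gradient ascent step per level, clipped to [-eps, eps].

    pixel_grad is d(loss)/d(input) at full resolution. A coarse cell at scale s
    drives an s x s block of pixels with weight m, so its gradient is m times
    the sum of the pixel gradients in that block (chain rule through upsample).
    """
    updated = []
    for delta, s, m in zip(levels, scales, multipliers):
        new_level = []
        for i, row in enumerate(delta):
            new_row = []
            for j, v in enumerate(row):
                g = m * sum(pixel_grad[i * s + a][j * s + b]
                            for a in range(s) for b in range(s))
                new_row.append(max(-eps, min(eps, v + lr * sign(g))))
            new_level.append(new_row)
        updated.append(new_level)
    return updated


def apply_perturbation(image, perturbation, lo=0.0, hi=1.0):
    """Add the perturbation and clip back to the valid pixel range."""
    return [[max(lo, min(hi, p + d)) for p, d in zip(prow, drow)]
            for prow, drow in zip(image, perturbation)]
```

With a 4x4 input, `scales=[1, 2, 4]` gives levels of shape 4x4, 2x2 and 1x1. The coarser levels can only shift whole blocks of pixels, which makes the perturbation structured rather than pixel-wise noise.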
The cooperation of a human pilot with an autonomous agent during flight control realizes parallel autonomy. A parallel-autonomous system acts as a guardian that significantly enhances the robustness and safety of flight operations in challenging circumstances. Here, we propose an air-guardian concept that facilitates cooperation between an artificial pilot agent and a parallel end-to-end neural control system. Our vision-based air-guardian system combines a causal continuous-depth neural network model with a cooperation layer to enable parallel autonomy between a pilot agent and a control system based on perceived differences in their attention profiles. The attention profiles are obtained by computing the networks' saliency maps (feature importance) through the VisualBackProp algorithm. The guardian agent is trained via reinforcement learning in a simulated fixed-wing aircraft environment. When the attention profiles of the pilot and guardian agents align, the pilot makes control decisions. If the attention maps of the pilot and the guardian do not align, the air-guardian intervenes and takes over control of the aircraft. We show that our attention-based air-guardian system can balance the trade-off between its level of involvement in the flight and the pilot's expertise and attention. We demonstrate the effectiveness of our methods in simulated flight scenarios with a fixed-wing aircraft and on a real drone platform.
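The arbitration rule described above can be sketched as follows: the pilot keeps control while the two attention profiles agree, and the guardian takes over otherwise. Cosine similarity between flattened saliency maps and the 0.7 threshold are assumptions for illustration, because the abstract does not say how alignment is measured. In the real system the saliency maps would come from VisualBackProp rather than being given directly. The action labels are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equally sized 2-D saliency maps."""
    fa = [v for row in a for v in row]
    fb = [v for row in b for v in row]
    na = math.sqrt(sum(v * v for v in fa))
    nb = math.sqrt(sum(v * v for v in fb))
    if na == 0.0 or nb == 0.0:
        return 0.0  # an empty attention map is treated as "not aligned"
    return sum(x * y for x, y in zip(fa, fb)) / (na * nb)


def arbitrate(pilot_map, guardian_map, pilot_action, guardian_action,
              threshold=0.7):
    """Return (action, controller): pilot if attention aligns, else guardian."""
    if cosine_similarity(pilot_map, guardian_map) >= threshold:
        return pilot_action, "pilot"
    return guardian_action, "guardian"


runway_focus = [[0.0, 0.9], [0.0, 0.1]]
distracted = [[0.8, 0.0], [0.2, 0.0]]
print(arbitrate(runway_focus, runway_focus, "pitch_up", "hold"))  # ('pitch_up', 'pilot')
print(arbitrate(distracted, runway_focus, "pitch_up", "hold"))    # ('hold', 'guardian')
```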
The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practices and the bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, and algorithm characteristics. A median of 72% of challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants (32%) stated that they did not have enough time for method development. 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants, and only 50% of the participants performed ensembling, based on either multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
With the progress of sensor technology in wearables, the collection and analysis of PPG signals are gaining more interest. Using machine learning, the cardiac rhythm corresponding to PPG signals can be used to predict different tasks such as activity recognition, sleep stage detection, or more general health status. However, supervised learning is often limited by the amount of available labeled data, which is typically expensive to obtain. To address this problem, we propose a Self-Supervised Learning (SSL) method with a pretext task of signal reconstruction to learn an informative generalized PPG representation. The performance of the proposed SSL framework is compared with two fully supervised baselines. The results show that in a very limited labeled-data setting (10 samples per class or fewer), using SSL is beneficial, and a simple classifier trained on SSL-learned representations outperforms fully supervised deep neural networks. However, the results reveal that the SSL-learned representations are too focused on encoding the subjects. Unfortunately, there is high inter-subject variability in the SSL-learned representations, which makes working with this data more challenging when labeled data is scarce. The high inter-subject variability suggests that there is still room for improvement in learning representations. In general, the results suggest that SSL may pave the way for the broader use of machine learning models on PPG data in label-scarce regimes.
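The abstract does not specify how the reconstruction pretext task corrupts the signal. A common choice, assumed here, is to mask a contiguous segment of the PPG window and score the reconstruction only on the masked samples. In the sketch, linear interpolation stands in for the neural encoder-decoder (a hypothetical placeholder), so the objective can be run end to end. In an SSL setup, a network would be trained to minimise `masked_mse`, and its encoder output would then serve as the learned representation.

```python
import random


def mask_segment(signal, frac, rng):
    """Zero out one random contiguous segment covering `frac` of the window."""
    n = len(signal)
    length = max(1, int(n * frac))
    start = rng.randrange(0, n - length + 1)
    mask = [start <= i < start + length for i in range(n)]
    corrupted = [0.0 if m else x for x, m in zip(signal, mask)]
    return corrupted, mask


def masked_mse(original, reconstruction, mask):
    """Reconstruction loss computed only where the input was masked."""
    errors = [(o - r) ** 2 for o, r, m in zip(original, reconstruction, mask) if m]
    return sum(errors) / len(errors)


def interpolate_gap(corrupted, mask):
    """Placeholder 'model': linearly interpolate across the masked span."""
    idx = [i for i, m in enumerate(mask) if m]
    out = list(corrupted)
    lo, hi = idx[0] - 1, idx[-1] + 1
    if lo < 0 and hi >= len(out):
        return out  # nothing unmasked to interpolate from
    left = out[lo] if lo >= 0 else out[hi]
    right = out[hi] if hi < len(out) else out[lo]
    for i in idx:
        t = (i - lo) / (hi - lo)
        out[i] = left + t * (right - left)
    return out


rng = random.Random(0)
window = [0.5 + 0.4 * ((i % 25) / 25) for i in range(100)]  # toy sawtooth "pulse"
corrupted, mask = mask_segment(window, 0.15, rng)
print(masked_mse(window, interpolate_gap(corrupted, mask), mask))
```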
The field of cybersecurity is evolving fast. Experts need to be informed about past, current and, in the best case, upcoming threats, because attacks are becoming more advanced, targets bigger and systems more complex. As this cannot be addressed manually, cybersecurity experts need to rely on machine learning techniques. In the textual domain, pre-trained language models like BERT have been shown to be helpful by providing a good baseline for further fine-tuning. However, due to the domain knowledge and many technical terms in cybersecurity, general language models might miss the gist of textual information, hence doing more harm than good. For this reason, we create a high-quality dataset and present a language model specifically tailored to the cybersecurity domain, which can serve as a basic building block for cybersecurity systems that deal with natural language. The model is compared with other models on 15 different domain-dependent extrinsic and intrinsic tasks as well as general tasks from the SuperGLUE benchmark. On the one hand, the results of the intrinsic tasks show that our model improves the internal representation space of words compared to the other models. On the other hand, the extrinsic, domain-dependent tasks, consisting of sequence tagging and classification, show that the model performs best in specific application scenarios, in contrast to the other models. Furthermore, we show that our approach against catastrophic forgetting works, as the model is able to retrieve the previously trained domain-independent knowledge. The dataset used and the trained model are made publicly available.
Bayesian optimization (BO) is increasingly employed in critical applications such as materials design and drug discovery. An increasingly popular strategy in BO is to forgo the sole reliance on high-fidelity data and instead use an ensemble of information sources which provide inexpensive low-fidelity data. The premise of this strategy is to reduce the overall sampling cost by querying inexpensive low-fidelity sources whose data are correlated with high-fidelity samples. Here, we propose a multi-fidelity cost-aware BO framework that dramatically outperforms the state-of-the-art technologies in terms of efficiency, consistency, and robustness. We demonstrate the advantages of our framework on analytic and engineering problems and argue that these benefits stem from our two main contributions: (1) we develop a novel acquisition function for multi-fidelity cost-aware BO that safeguards the convergence against the biases of low-fidelity data, and (2) we tailor a newly developed emulator for multi-fidelity BO which enables us not only to simultaneously learn from an ensemble of multi-fidelity datasets, but also to identify the severely biased low-fidelity sources that should be excluded from BO.
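A common baseline for cost-aware multi-fidelity BO, shown here only to make the cost trade-off concrete, ranks each (candidate, source) pair by expected improvement per unit query cost. This is not the paper's acquisition function, which additionally guards convergence against low-fidelity bias. The sketch assumes minimization and assumes that Gaussian predictive means, standard deviations and per-source costs are supplied by some emulator. The source labels are hypothetical.

```python
import math


def expected_improvement(mu, sigma, best):
    """Closed-form EI for minimization under a Gaussian prediction N(mu, sigma^2)."""
    if sigma <= 0.0:
        return max(best - mu, 0.0)
    z = (best - mu) / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (best - mu) * cdf + sigma * pdf


def pick_query(candidates, best):
    """candidates: iterable of (label, mu, sigma, cost); returns the best EI/cost."""
    return max(candidates,
               key=lambda c: expected_improvement(c[1], c[2], best) / c[3])[0]


options = [
    ("high-fidelity", 0.0, 1.0, 10.0),  # hypothetical expensive, accurate source
    ("low-fidelity", 0.2, 1.0, 1.0),    # hypothetical cheap source
]
print(pick_query(options, best=0.5))  # low-fidelity
```

This naive ratio keeps choosing a cheap source even when that source is biased. That is the failure mode the paper's acquisition function and its exclusion of severely biased sources are meant to address.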
The ongoing transition from a linear (produce-use-dispose) to a circular economy poses significant challenges to current state-of-the-art information and communication technologies. In particular, the derivation of integrated, high-level views on material, process, and product streams from (real-time) data produced along value chains is challenging for several reasons. Most importantly, sufficiently rich data is often available yet not shared across company borders because of privacy concerns, which makes it impossible to build integrated process models that capture the interrelations between input materials, process parameters, and key performance indicators along value chains. In the current contribution, we propose a privacy-preserving, federated multivariate statistical process control (FedMSPC) framework based on Federated Principal Component Analysis (PCA) and Secure Multiparty Computation to strengthen the incentive for closer collaboration of stakeholders along value chains. We tested our approach on two industrial benchmark data sets: SECOM and ST-AWFD. Our empirical results demonstrate the superior fault detection capability of the proposed approach compared to standard, single-party (multiway) PCA. Furthermore, we showcase the ability of our framework to provide privacy-preserving fault diagnosis to each data holder in the value chain, underpinning the benefits of secure data sharing and federated process modeling.
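One building block behind federated PCA can be illustrated with plain sufficient statistics. Suppose each party holds different batches (rows) of the same process variables. It then only needs to share its sample count, column sums and cross-product sums, and the pooled covariance matrix from which a PCA model is derived is recovered exactly. This sketch rests on several assumptions. The horizontal data split is assumed, whereas value-chain data may instead be split by variables. The aggregation runs in the clear rather than under Secure Multiparty Computation. The function names are hypothetical and do not describe FedMSPC's actual protocol.

```python
def local_stats(rows):
    """Per-party sufficient statistics: count, column sums, cross-product sums."""
    d = len(rows[0])
    s = [0.0] * d
    sxx = [[0.0] * d for _ in range(d)]
    for r in rows:
        for i in range(d):
            s[i] += r[i]
            for j in range(d):
                sxx[i][j] += r[i] * r[j]
    return len(rows), s, sxx


def merge_stats(stats):
    """Add up the parties' statistics (the step an SMPC protocol would protect)."""
    stats = list(stats)
    d = len(stats[0][1])
    n_tot = sum(n for n, _, _ in stats)
    s_tot = [sum(s[i] for _, s, _ in stats) for i in range(d)]
    sxx_tot = [[sum(sxx[i][j] for _, _, sxx in stats) for j in range(d)]
               for i in range(d)]
    return n_tot, s_tot, sxx_tot


def covariance(n, s, sxx):
    """Unbiased sample covariance from (possibly aggregated) statistics."""
    d = len(s)
    mean = [v / n for v in s]
    return [[(sxx[i][j] - n * mean[i] * mean[j]) / (n - 1) for j in range(d)]
            for i in range(d)]


party_a = [[1.0, 2.0], [2.0, 1.0], [3.0, 5.0]]
party_b = [[4.0, 4.0], [0.0, 1.0]]
print(covariance(*merge_stats([local_stats(party_a), local_stats(party_b)])))
```

Once the pooled covariance (or only the resulting PCA loadings) is available, each party can compute the usual monitoring statistics, such as Hotelling's T² and the Q residual, on its own data locally.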
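A proper parametrization of the state transition matrices of linear state-space models (SSMs), followed by standard nonlinearities, enables them to efficiently learn representations from sequential data. In this paper, we show that we can improve further when a structural SSM such as S4 is given by a linear liquid time-constant (LTC) state-space model. LTC neural networks are causal continuous-time neural networks with an input-dependent state transition module, which makes them learn to adapt to incoming inputs at inference. We show that by using the diagonal and the diagonal-plus-low-rank decompositions of the state transition matrix introduced in S4, together with a few simplifications, the LTC-based structural state-space model, dubbed Liquid-S4, achieves new state-of-the-art generalization across sequence modeling tasks with long-term dependencies, such as image, text, audio, and medical time series, with an average performance of 87.32% on the Long-Range Arena benchmark. On the full raw Speech Commands recognition dataset, Liquid-S4 achieves 96.78% accuracy with a 30% reduction in parameter count compared to S4. The additional gain in performance is a direct result of Liquid-S4's kernel structure, which takes into account the similarities of the input sequence samples during training and inference.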
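The input-dependent transition that distinguishes Liquid-S4 from S4 can be sketched in recurrent form. The sketch assumes the linearized LTC update x_k = (A + B u_k) x_{k-1} + B u_k with output y_k = C x_k, a diagonal, already-discretized real state matrix, and a scalar input. The diagonal-plus-low-rank parameterization, complex-valued states, discretization and the convolutional kernel used for training are all omitted, so this is an illustrative toy, not the Liquid-S4 implementation.

```python
def ssm_scan(a, b, c, inputs, liquid=True):
    """Run a diagonal SSM over a scalar input sequence.

    a, b, c: per-state-dimension lists (diagonal discretized A, input vector B,
    readout C). With liquid=True the transition becomes (a_i + b_i * u_k), the
    assumed linearized-LTC form; liquid=False gives a plain linear SSM.
    """
    x = [0.0] * len(a)
    outputs = []
    for u in inputs:
        x = [
            ((ai + bi * u) if liquid else ai) * xi + bi * u
            for ai, bi, xi in zip(a, b, x)
        ]
        outputs.append(sum(ci * xi for ci, xi in zip(c, x)))
    return outputs


print(ssm_scan([0.5], [1.0], [1.0], [1.0, 1.0], liquid=False))  # [1.0, 1.5]
print(ssm_scan([0.5], [1.0], [1.0], [1.0, 1.0]))                # [1.0, 2.5]
```

Unrolling the liquid recurrence two steps gives y_2 = C(A B u_1 + B² u_1 u_2 + B u_2). The cross term u_1 u_2 is one way to read the claim that the kernel accounts for similarities between input samples. The plain SSM output, by contrast, stays linear in the inputs.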
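In contrast to methods for the exploratory analysis of high-dimensional datasets such as principal component analysis (PCA), neighbor embedding (NE) techniques tend to better preserve the local structure/topology of high-dimensional data. However, the ability to preserve local structure comes at the cost of interpretability: techniques such as t-distributed stochastic neighbor embedding (t-SNE) or uniform manifold approximation and projection (UMAP) do not provide an interpretation of the topological (cluster) structure seen in the corresponding embeddings. Here, we present different "tricks" from the field of chemometrics, based on PCA, Q-residuals, and Hotelling's T² contributions, combined with novel visualization approaches, to derive local and global interpretations of neighbor embeddings. We show how our approach can identify discriminative features between groups of data points using standard univariate or multivariate methods.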
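The chemometrics statistics named above have standard definitions for a fitted PCA model with a mean vector, orthonormal loadings P and component variances λ_a. The scores are t = Pᵀ(x − mean), Hotelling's T² = Σ_a t_a²/λ_a, and Q = ‖e‖² for the residual e = (x − mean) − P t. The sketch below computes both statistics and their per-variable contributions. For Q the contributions are e_j². For T² it uses Σ_a (t_a/λ_a) p_ja (x_j − mean_j), one common convention whose terms sum to T². It assumes the PCA model is already fitted, for example on one cluster found in a t-SNE or UMAP embedding. How the paper combines these quantities with its visualizations is not reproduced.

```python
def pca_diagnostics(x, mean, loadings, variances):
    """Hotelling's T^2, Q residual and per-variable contributions for one sample.

    loadings: d x k list with orthonormal columns; variances: k component variances.
    """
    d, k = len(x), len(variances)
    xc = [x[j] - mean[j] for j in range(d)]
    scores = [sum(loadings[j][a] * xc[j] for j in range(d)) for a in range(k)]
    recon = [sum(loadings[j][a] * scores[a] for a in range(k)) for j in range(d)]
    resid = [xc[j] - recon[j] for j in range(d)]

    t2 = sum(scores[a] ** 2 / variances[a] for a in range(k))
    q = sum(e * e for e in resid)
    # Contributions: each list sums to its statistic, so large entries point to
    # the variables that drive a sample (or a cluster) away from the model.
    t2_contrib = [sum(scores[a] / variances[a] * loadings[j][a] * xc[j]
                      for a in range(k)) for j in range(d)]
    q_contrib = [e * e for e in resid]
    return t2, q, t2_contrib, q_contrib


P = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]  # toy loadings: 3 variables, 2 components
print(pca_diagnostics([2.0, 1.0, 3.0], [0.0, 0.0, 0.0], P, [4.0, 1.0]))
# (2.0, 9.0, [1.0, 1.0, 0.0], [0.0, 0.0, 9.0])
```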